

Algorithm 2 Discrete backpropagation via projection

Input: The training dataset; the full-precision kernels $C$; the projection matrix $W$; the learning rates $\eta_1$ and $\eta_2$.
Output: The binary or ternary PCNNs based on the updated $C$ and $W$.

1: Initialize $C$ and $W$ randomly;
2: repeat
3:    // Forward propagation
4:    for $l = 1$ to $L$ do
5:        $\hat{C}^l_{i,j} \leftarrow P(W, C^l_i)$; // using Eq. 3.43 (binary) or Eq. 3.59 (ternary)
6:        $D^l_i \leftarrow \mathrm{Concatenate}(\hat{C}_{i,j})$; // using Eq. 3.45
7:        Perform activation binarization; // using the sign function
8:        Perform traditional 2D convolution; // using Eqs. 3.46, 3.47, and 3.48
9:    end for
10:   Calculate the cross-entropy loss $L_S$;
11:   // Backward propagation
12:   Compute $\delta_{\hat{C}^l_{i,j}} = \partial L_S / \partial \hat{C}^l_{i,j}$;
13:   for $l = L$ to $1$ do
14:       // Calculate the gradients
15:       Calculate $\delta_{C^l_i}$; // using Eqs. 3.49, 3.51, and 3.52
16:       Calculate $\delta_{W^l_j}$; // using Eqs. 3.115, 3.116, and 3.56
17:       // Update the parameters
18:       $C^l_i \leftarrow C^l_i - \eta_1 \delta_{C^l_i}$; // using Eq. 3.50
19:       $W^l_j \leftarrow W^l_j - \eta_2 \delta_{W^l_j}$; // using Eq. 3.54
20:   end for
21:   Adjust the learning rates $\eta_1$ and $\eta_2$;
22: until the network converges
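To make Algorithm 2 concrete, the following is a minimal PyTorch-style sketch of one projection layer and the training loop. It assumes a single per-kernel projection scale $W$, a sign-based binary projection, and a straight-through estimator for the discrete step; the names (SignSTE, ProjectionConv, the two learning-rate groups standing in for $\eta_1$ and $\eta_2$) are illustrative and do not reproduce the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SignSTE(torch.autograd.Function):
    # sign() in the forward pass, straight-through (clipped) gradient in the backward pass.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # pass gradients only where |x| <= 1

class ProjectionConv(nn.Module):
    # One PCNN-style layer: project the full-precision kernels C with W,
    # binarize the activations, then run an ordinary 2D convolution.
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.C = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)  # full-precision kernels
        self.W = nn.Parameter(torch.ones(out_ch, 1, 1, 1))  # per-kernel projection scale (assumed form)

    def forward(self, x):
        c_hat = SignSTE.apply(self.W * self.C)  # discrete projected kernels (binary case)
        x_bin = SignSTE.apply(x)                # activation binarization with the sign function
        return F.conv2d(x_bin, c_hat, padding=1)

layer = ProjectionConv(3, 16)
head = nn.Linear(16, 10)
opt = torch.optim.SGD(
    [{"params": [layer.C], "lr": 1e-2},   # eta1 for the kernels C
     {"params": [layer.W], "lr": 1e-3},   # eta2 for the projection parameters W
     {"params": head.parameters()}],
    lr=1e-2,
)

for step in range(3):                      # "repeat ... until the network converges"
    x = torch.randn(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    feat = layer(x).mean(dim=(2, 3))       # forward propagation
    loss = F.cross_entropy(head(feat), y)  # cross-entropy loss L_S
    opt.zero_grad()
    loss.backward()                        # backward propagation through the projection
    opt.step()                             # update C and W with their own learning rates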

We believe that compressed ternary CNNs such as TTN [299] and TWN [130] provide better initialization states for binary CNNs. Theoretically, the performance of models with ternary weights is slightly better than that of models with binary weights but far worse than that of real-valued models. Still, they provide an excellent initialization state for 1-bit CNNs in our proposed progressive optimization framework. Subsequent experiments show that our PCNNs trained with the progressive optimization strategy perform better than those trained from scratch, and even better than the ternary PCNNs trained from scratch.

The discrete set for ternary weights is a special case, defined as $\Omega := \{a_1, a_2, a_3\}$. To be hardware friendly [130], we further require $a_1 = -a_3 = \Delta$, as in Eq. 3.57, and $a_2 = 0$.

Regarding the threshold for ternary weights, we follow the choice made in [229] as

$$\Delta^l = \sigma \times E\left(|C^l|\right) \approx \frac{\sigma}{I} \sum_{i} \left\| C^l_i \right\|_1, \qquad (3.58)$$

where $\sigma$ is a constant factor for all layers. Note that [229] applies Eq. 3.58 to convolutional inputs or feature maps; we find it appropriate for convolutional weights as well. Consequently,

we redefine the projection in Eq. 3.29 as

$$P_\Omega(\omega, x) = \arg\min_{a_i} \left\| \omega x - a_i \right\|_2, \quad i \in \{1, \ldots, U\}. \qquad (3.59)$$
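As a concrete illustration of Eqs. 3.58 and 3.59, the short sketch below (assuming PyTorch; the value $\sigma = 0.7$ and the helper names are illustrative assumptions rather than values or code from the text) computes a layer-wise threshold and projects full-precision kernels onto $\Omega = \{-\Delta, 0, +\Delta\}$ by picking the nearest element.

import torch

def ternary_threshold(C, sigma=0.7):
    # Delta^l = sigma * E(|C^l|) for one layer's kernels C (cf. Eq. 3.58);
    # sigma = 0.7 is an assumed constant factor, not one prescribed by the text.
    return sigma * C.abs().mean()

def project_onto_set(v, levels):
    # Element-wise nearest-point projection, arg min over a_i of |v - a_i| (cf. Eq. 3.59).
    levels = torch.as_tensor(levels, dtype=v.dtype)
    dist = (v.unsqueeze(-1) - levels).abs()   # distance of every entry to every a_i
    return levels[dist.argmin(dim=-1)]

C = torch.randn(16, 3, 3, 3)                  # full-precision kernels of one layer
delta = float(ternary_threshold(C))
C_ternary = project_onto_set(C, [-delta, 0.0, delta])  # entries now lie in {-Delta, 0, +Delta}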

In our proposed progressive optimization framework, the PCNNs with ternary weights

(ternary PCNNs) are first trained from scratch and then serve as pre-trained models to

progressively fine-tune the PCNNs with binary weights (binary PCNNs).
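A minimal sketch of this hand-off, under the assumption that the ternary and binary variants share the same parameter layout (the TinyPCNN class and its mode flag are placeholders, not the actual PCNN code):

import torch
import torch.nn as nn

class TinyPCNN(nn.Module):
    # Placeholder module: only the shared parameters C and W matter for the hand-off.
    def __init__(self, mode="ternary"):
        super().__init__()
        self.mode = mode                                       # ternary or binary projection
        self.C = nn.Parameter(torch.randn(16, 3, 3, 3) * 0.1)  # full-precision kernels
        self.W = nn.Parameter(torch.ones(16, 1, 1, 1))         # projection parameters

ternary_model = TinyPCNN(mode="ternary")
# ... train ternary_model from scratch (omitted) ...

binary_model = TinyPCNN(mode="binary")
binary_model.load_state_dict(ternary_model.state_dict())  # the ternary C and W initialize the binary PCNN
# ... progressively fine-tune binary_model, typically with smaller learning rates (omitted) ...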